Search | VHL Regional Portal

1.

Predictors of Inadequate Serum Urate Response to Low-Dose Febuxostat in Male Patients with Gout.

Sun, Wenyan; Zhao, Xuetong; Dalbeth, Nicola; Terkeltaub, Robert; Cui, Lingling; Liu, Zhen; Han, Lin; Wang, Can; Zhang, Hui; Bao, Yiming; Li, Changgui; Lu, Jie.

J Inflamm Res ; 17: 2657-2668, 2024.

Article in English | MEDLINE | ID: mdl-38707960

ABSTRACT

Objective: This study aimed to understand predictors of inadequate response (IR) to low-dose febuxostat treatment based on clinical variables. Methods: We pooled data from 340 patients from an observational cohort and two clinical trials who received febuxostat 20 mg/day for at least 3 months. IR was defined as failure to reach the target serum urate level (sUA<6 mg/dL) at any time point during 3 months of treatment. The potential predictors associated with short- or mid-term febuxostat IR after pooling the three cohorts were explored using mixed-effect logistic analysis. Machine learning models were built to evaluate the predictors of IR using the pooled data as the discovery set and were validated in an external test set. Results: Of the 340 patients, 68.9% and 51.8% were non-responders to low-dose febuxostat during short- and mid-term follow-up, respectively. Serum urate and triglyceride (TG) levels were significantly associated with febuxostat IR, and were also selected as significant features by LASSO analysis, together with age, BMI, and C-reactive protein (CRP). These five features in combination, using the best-performing stochastic gradient descent classifier, achieved an area under the receiver operating characteristic curve of 0.873 (95% CI [0.763, 0.942]) and 0.706 (95% CI [0.636, 0.727]) in the internal and external test sets, respectively, to predict febuxostat IR. Conclusion: Response to low-dose febuxostat is associated with early sUA improvement in individual patients, as well as patient age, BMI, and levels of TG and CRP.

2.

Plant genomic resources at National Genomics Data Center: assisting in data-driven breeding applications.

Tian, Dongmei; Xu, Tianyi; Kang, Hailong; Luo, Hong; Wang, Yanqing; Chen, Meili; Li, Rujiao; Ma, Lina; Wang, Zhonghuang; Hao, Lili; Tang, Bixia; Zou, Dong; Xiao, Jingfa; Zhao, Wenming; Bao, Yiming; Zhang, Zhang; Song, Shuhui.

aBIOTECH ; 5(1): 94-106, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38576435

ABSTRACT

Genomic data serve as an invaluable resource for unraveling the intricacies of the higher plant systems, including the constituent elements within and among species. Through various efforts in genomic data archiving, integrative analysis and value-added curation, the National Genomics Data Center (NGDC), which is a part of the China National Center for Bioinformation (CNCB), has successfully established and currently maintains a vast amount of database resources. This dedicated initiative of the NGDC facilitates a data-rich ecosystem that greatly strengthens and supports genomic research efforts. Here, we present a comprehensive overview of central repositories dedicated to archiving, presenting, and sharing plant omics data, introduce knowledgebases focused on variants or gene-based functional insights, highlight species-specific multiple omics database resources, and briefly review the online application tools. We intend that this review can be used as a guide map for plant researchers wishing to select effective data resources from the NGDC for their specific areas of study. Supplementary Information: The online version contains supplementary material available at 10.1007/s42994-023-00134-4.

3.

Unraveling the genetic variations underlying virulence disparities among SARS-CoV-2 strains across global regions: insights from Pakistan.

Jabeen, Momina; Shoukat, Shifa; Shireen, Huma; Bao, Yiming; Khan, Abbas; Abbasi, Amir Ali.

Virol J ; 21(1): 55, 2024 Mar 06.

Article in English | MEDLINE | ID: mdl-38449001

ABSTRACT

Over the course of the COVID-19 pandemic, several SARS-CoV-2 variants have emerged that may exhibit different etiological effects such as enhanced transmissibility and infectivity. However, genetic variations that reduce virulence and deteriorate viral fitness have not yet been thoroughly investigated. The present study sought to evaluate the effects of viral genetic makeup on COVID-19 epidemiology in Pakistan, where the infectivity and mortality rates were comparatively lower than in other countries during the first pandemic wave. For this purpose, we focused on comparative analyses of the 7096-amino-acid-long polyprotein pp1ab. Comparative sequence analysis of 203 SARS-CoV-2 genomes, sampled from Pakistan during the first wave of the pandemic, revealed 179 amino acid substitutions in pp1ab. Within this set, 38 substitutions were identified within the Nsp3 region of the pp1ab polyprotein. Structural and biophysical analysis of proteins revealed that amino acid variations within Nsp3's macrodomains induced conformational changes and modified protein-ligand interactions, consequently diminishing the virulence and fitness of SARS-CoV-2. Additionally, the epistatic effects resulting from evolutionary substitutions in SARS-CoV-2 proteins may have unnoticed implications for reducing disease burden. In light of these findings, further characterization of such deleterious SARS-CoV-2 mutations will not only aid in identifying potential therapeutic targets but will also provide a roadmap for maintaining vigilance against the genetic variability of diverse SARS-CoV-2 strains circulating globally. Furthermore, these insights empower us to more effectively manage and respond to potential viral-based pandemic outbreaks of a similar nature in the future.

Subject(s)

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , Pakistan/epidemiology , Pandemics , Virulence/genetics , Amino Acids , Polyproteins , Genetic Variation

4.

CROST: a comprehensive repository of spatial transcriptomics.

Wang, Guoliang; Wu, Song; Xiong, Zhuang; Qu, Hongzhu; Fang, Xiangdong; Bao, Yiming.

Nucleic Acids Res ; 52(D1): D882-D890, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37791883

ABSTRACT

The development of spatial transcriptome sequencing technology has revolutionized our comprehension of complex tissues and propelled life and health sciences into an era of spatial omics. However, the current availability of databases for accessing and analyzing spatial transcriptomic data is limited. In response, we have established CROST (https://ngdc.cncb.ac.cn/crost), a comprehensive repository of spatial transcriptomics. CROST encompasses high-quality samples and houses 182 spatial transcriptomic datasets from diverse species, organs, and diseases, comprising 1033 sub-datasets and 48 043 tumor-related spatially variable genes (SVGs). Additionally, it encompasses a standardized spatial transcriptome data processing pipeline, integrates single-cell RNA sequencing deconvolution spatial transcriptomics data, and evaluates correlation, colocalization, intercellular communication, and biological function annotation analyses. Moreover, CROST integrates the transcriptome, epigenome, and genome to explore tumor-associated SVGs and provides a comprehensive understanding of their roles in cancer progression and prognosis. Furthermore, CROST provides two online tools, single-sample gene set enrichment analysis and SpatialAP, for users to annotate and analyze the uploaded spatial transcriptomics data. The user-friendly interface of CROST facilitates browsing, searching, analyzing, visualizing, and downloading desired information. Collectively, CROST offers fresh and comprehensive insights into tissue structure and a foundation for understanding multiple biological mechanisms in diseases, particularly in tumor tissues.

Subject(s)

Databases, Genetic , Gene Expression Profiling , Neoplasms , Humans , Genome , Neoplasms/genetics , Transcriptome

5.

HALL: a comprehensive database for human aging and longevity studies.

Li, Hao; Wu, Song; Li, Jiaming; Xiong, Zhuang; Yang, Kuan; Ye, Weidong; Ren, Jie; Wang, Qiaoran; Xiong, Muzhao; Zheng, Zikai; Zhang, Shuo; Han, Zichu; Yang, Peng; Jiang, Beier; Ping, Jiale; Zuo, Yuesheng; Lu, Xiaoyong; Zhai, Qiaocheng; Yan, Haoteng; Wang, Si; Ma, Shuai; Zhang, Bing; Ye, Jinlin; Qu, Jing; Yang, Yun-Gui; Zhang, Feng; Liu, Guang-Hui; Bao, Yiming; Zhang, Weiqi.

Nucleic Acids Res ; 52(D1): D909-D918, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37870433

ABSTRACT

Diverse individuals age at different rates and display variable susceptibilities to tissue aging, functional decline and aging-related diseases. Centenarians, exemplifying extreme longevity, serve as models for healthy aging. The field of human aging and longevity research is rapidly advancing, garnering significant attention and accumulating substantial data in recent years. Omics technologies, encompassing phenomics, genomics, transcriptomics, proteomics, metabolomics and microbiomics, have provided multidimensional insights and revolutionized cohort-based investigations into human aging and longevity. Accumulated data, covering diverse cells, tissues and cohorts across the lifespan necessitates the establishment of an open and integrated database. Addressing this, we established the Human Aging and Longevity Landscape (HALL), a comprehensive multi-omics repository encompassing a diverse spectrum of human cohorts, spanning from young adults to centenarians. The core objective of HALL is to foster healthy aging by offering an extensive repository of information on biomarkers that gauge the trajectory of human aging. Moreover, the database facilitates the development of diagnostic tools for aging-related conditions and empowers targeted interventions to enhance longevity. HALL is publicly available at https://ngdc.cncb.ac.cn/hall/index.

Subject(s)

Aging , Databases, Factual , Longevity , Multiomics , Aged, 80 and over , Humans , Young Adult , Aging/genetics , Biomarkers , Disease Susceptibility , Genomics , Longevity/genetics

6.

TargetGene: a comprehensive database of cell-type-specific target genes for genetic variants.

Lin, Shiqi; Wu, Song; Zhao, Wei; Fang, Zhanjie; Kang, Hongen; Liu, Xinxuan; Pan, Siyu; Yu, Fudong; Bao, Yiming; Jia, Peilin.

Nucleic Acids Res ; 52(D1): D1072-D1081, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37870478

ABSTRACT

Annotating genetic variants to their target genes is of great importance in unraveling the causal variants and genetic mechanisms that underlie complex diseases. However, disease-associated genetic variants are often located in non-coding regions and manifest context-specific effects, making it challenging to accurately identify the target genes and regulatory mechanisms. Here, we present TargetGene (https://ngdc.cncb.ac.cn/targetgene/), a comprehensive database reporting target genes for human genetic variants from various aspects. Specifically, we collected a comprehensive catalog of multi-omics data at the single-cell and bulk levels and from various human tissues, cell types and developmental stages. To facilitate the identification of Single Nucleotide Polymorphism (SNP)-to-gene connections, we have implemented multiple analytical tools based on chromatin co-accessibility, 3D interaction, enhancer activities and quantitative trait loci, among others. We applied the pipeline to evaluate variants from nearly 1300 Genome-wide association studies (GWAS) and assembled a comprehensive atlas of multiscale regulation of genetic variants. TargetGene is equipped with user-friendly web interfaces that enable intuitive searching, navigation and browsing through the results. Overall, TargetGene provides a unique resource to empower researchers to study the regulatory mechanisms of genetic variants in complex human traits.

Subject(s)

Databases, Genetic , Genome-Wide Association Study , Quantitative Trait Loci , Humans , Chromatin/genetics , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide

7.

PPGR: a comprehensive perennial plant genomes and regulation database.

Yang, Sen; Zong, Wenting; Shi, Lingling; Li, Ruisi; Ma, Zhenshu; Ma, Shubao; Si, Jingna; Wu, Zhijing; Zhai, Jinglan; Ma, Yingke; Fan, Zhuojing; Chen, Sisi; Huang, Huahong; Zhang, Deqiang; Bao, Yiming; Li, Rujiao; Xie, Jianbo.

Nucleic Acids Res ; 52(D1): D1588-D1596, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37933857

ABSTRACT

Perennial woody plants hold vital ecological significance, distinguished by their unique traits. While significant progress has been made in their genomic and functional studies, a major challenge persists: the absence of a comprehensive reference platform for collection, integration and in-depth analysis of the vast amount of data. Here, we present PPGR (Resource for Perennial Plant Genomes and Regulation; https://ngdc.cncb.ac.cn/ppgr/) to address this critical gap, by collecting, integrating, analyzing and visualizing genomic, gene regulation and functional data of perennial plants. PPGR currently includes 60 species, 847 million protein-protein/TF (transcription factor)-target interactions, 9016 transcriptome samples under various environmental conditions and genetic backgrounds. Noteworthy is the focus on genes that regulate wood production, seasonal dormancy, terpene biosynthesis and leaf senescence representing a wealth of information derived from experimental data, literature mining, public databases and genomic predictions. Furthermore, PPGR incorporates a range of multi-omics search and analysis tools to facilitate browsing and application of these extensive datasets. PPGR represents a comprehensive and high-quality resource for perennial plants, substantiated by an illustrative case study that demonstrates its capacity in unraveling gene functions and shedding light on potential regulatory processes.

Subject(s)

Databases, Genetic , Genome, Plant , Genomics , Plants/genetics , Transcriptome

8.

The P10K database: a data portal for the protist 10 000 genomes project.

Gao, Xinxin; Chen, Kai; Xiong, Jie; Zou, Dong; Yang, Fangdian; Ma, Yingke; Jiang, Chuanqi; Gao, Xiaoxuan; Wang, Guangying; Gu, Siyu; Zhang, Peng; Luo, Shuai; Huang, Kaiyao; Bao, Yiming; Zhang, Zhang; Ma, Lina; Miao, Wei.

Nucleic Acids Res ; 52(D1): D747-D755, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37930867

ABSTRACT

Protists, a highly diverse group of microscopic eukaryotic organisms distinct from fungi, animals and plants, exert crucial roles within the earth's biosphere. However, the genomes of only a small fraction of known protist species have been published and made publicly accessible. To address this constraint, the Protist 10 000 Genomes Project (P10K) was initiated, implementing a specialized pipeline for single-cell genome/transcriptome assembly, decontamination and annotation of protists. The resultant P10K database (https://ngdc.cncb.ac.cn/p10k/) serves as a comprehensive platform, collating and disseminating genome sequences and annotations from diverse protist groups. Currently, the P10K database has incorporated 2959 genomes and transcriptomes, including 1101 newly sequenced datasets by P10K and 1858 publicly available datasets. Notably, it covers 45% of the protist orders, with a significant representation (53% coverage) of ciliates, featuring nearly a thousand genomes/transcriptomes. Intriguingly, analysis of the unique codon table usage among ciliates has revealed differences compared to the NCBI taxonomy system, suggesting a need to revise the codon tables used for these species. Collectively, the P10K database serves as a valuable repository of genetic resources for protist research and aims to expand its collection by incorporating more sequenced data and advanced analysis tools to benefit protist studies worldwide.

Subject(s)

Databases, Genetic , Eukaryota , Fungi , Genome , Animals , Codon , Eukaryota/genetics , Fungi/genetics , Plants/genetics

9.

From BIG Data Center to China National Center for Bioinformation.

Bao, Yiming; Xue, Yongbiao.

Genomics Proteomics Bioinformatics ; 21(5): 900-903, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37832784

Subject(s)

Big Data , Genomics , China

10.

CCLHunter: An efficient toolkit for cancer cell line authentication.

Bu, Congfan; Zheng, Xinchang; Mai, Jialin; Nie, Zhi; Zeng, Jingyao; Qian, Qiheng; Xu, Tianyi; Sun, Yanling; Bao, Yiming; Xiao, Jingfa.

Comput Struct Biotechnol J ; 21: 4675-4682, 2023.

Article in English | MEDLINE | ID: mdl-37841327

ABSTRACT

Cancer cell lines are essential in cancer research, yet accurate authentication of these cell lines can be challenging, particularly for consanguineous cell lines with close genetic similarities. We introduce a new Cancer Cell Line Hunter (CCLHunter) method to tackle this challenge. This approach utilizes the information of single nucleotide polymorphisms, expression profiles, and kindred topology to authenticate 1389 human cancer cell lines accurately. CCLHunter can precisely and efficiently authenticate cell lines from consanguineous lineages and those derived from other tissues of the same individual. Our evaluation results indicate that CCLHunter has a complete accuracy rate of 93.27%, with an accuracy of 89.28% even for consanguineous cell lines, outperforming existing methods. Additionally, we provide convenient access to CCLHunter through standalone software and a web server at https://ngdc.cncb.ac.cn/cclhunter.

11.

RCoV19: A One-stop Hub for SARS-CoV-2 Genome Data Integration, Variant Monitoring, and Risk Pre-warning.

Li, Cuiping; Ma, Lina; Zou, Dong; Zhang, Rongqin; Bai, Xue; Li, Lun; Wu, Gangao; Huang, Tianhao; Zhao, Wei; Jin, Enhui; Bao, Yiming; Song, Shuhui.

Genomics Proteomics Bioinformatics ; 21(5): 1066-1079, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37898309

ABSTRACT

The Resource for Coronavirus 2019 (RCoV19) is an open-access information resource dedicated to providing valuable data on the genomes, mutations, and variants of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In this updated implementation of RCoV19, we have made significant improvements and advancements over the previous version. Firstly, we have implemented a highly refined genome data curation model. This model now features an automated integration pipeline and optimized curation rules, enabling efficient daily updates of data in RCoV19. Secondly, we have developed a global and regional lineage evolution monitoring platform, alongside an outbreak risk pre-warning system. These additions provide a comprehensive understanding of SARS-CoV-2 evolution and transmission patterns, enabling better preparedness and response strategies. Thirdly, we have developed a powerful interactive mutation spectrum comparison module. This module allows users to compare and analyze mutation patterns, assisting in the detection of potential new lineages. Furthermore, we have incorporated a comprehensive knowledgebase on mutation effects. This knowledgebase serves as a valuable resource for retrieving information on the functional implications of specific mutations. In summary, RCoV19 serves as a vital scientific resource, providing access to valuable data, relevant information, and technical support in the global fight against COVID-19. The complete contents of RCoV19 are available to the public at https://ngdc.cncb.ac.cn/ncov/.

Subject(s)

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/genetics , Knowledge Bases , Mutation

12.

A Comprehensive Benchmark of Transcriptomic Biomarkers for Immune Checkpoint Blockades.

Kang, Hongen; Zhu, Xiuli; Cui, Ying; Xiong, Zhuang; Zong, Wenting; Bao, Yiming; Jia, Peilin.

Cancers (Basel) ; 15(16), 2023 Aug 14.

Article in English | MEDLINE | ID: mdl-37627121

ABSTRACT

Immune checkpoint blockades (ICBs) have revolutionized cancer therapy by inducing durable clinical responses, but only a small percentage of patients can benefit from ICB treatments. Many studies have established various biomarkers to predict ICB responses. However, different biomarkers were found with diverse performances in practice, and a timely and unbiased assessment has yet to be conducted due to the complexity of ICB-related studies and trials. In this study, we manually curated 29 published datasets with matched transcriptome and clinical data from more than 1400 patients, and uniformly preprocessed these datasets for further analyses. In addition, we collected 39 sets of transcriptomic biomarkers, and based on the nature of the corresponding computational methods, we categorized them into the gene-set-like group (with the self-contained design and the competitive design, respectively) and the deconvolution-like group. Next, we investigated the correlations and patterns of these biomarkers and utilized a standardized workflow to systematically evaluate their performance in predicting ICB responses and survival statuses across different datasets, cancer types, antibodies, biopsy times, and combinatory treatments. In our benchmark, most biomarkers showed poor performance in terms of stability and robustness across different datasets. Two scores (TIDE and CYT) had a competitive performance for ICB response prediction, and two others (PASS-ON and EIGS_ssGSEA) showed the best association with clinical outcome. Finally, we developed ICB-Portal to host the datasets, biomarkers, and benchmark results and to implement the computational methods for researchers to test their custom biomarkers. Our work provided valuable resources and a one-stop solution to facilitate ICB-related research.

13.

McAN: a novel computational algorithm and platform for constructing and visualizing haplotype networks.

Li, Lun; Xu, Bo; Tian, Dongmei; Wang, Anke; Zhu, Junwei; Li, Cuiping; Li, Na; Zhao, Wei; Shi, Leisheng; Xue, Yongbiao; Zhang, Zhang; Bao, Yiming; Zhao, Wenming; Song, Shuhui.

Brief Bioinform ; 24(3), 2023 05 19.

Article in English | MEDLINE | ID: mdl-37170752

ABSTRACT

Haplotype networks are graphs used to represent evolutionary relationships between a set of taxa and are characterized by intuitiveness in analyzing genealogical relationships of closely related genomes. We here propose a novel algorithm termed McAN that considers mutation spectrum history (mutations in an ancestral haplotype should be contained in its descendant haplotypes), node size (corresponding to sample count for a given node) and sampling time when constructing a haplotype network. We show that McAN is two orders of magnitude faster than state-of-the-art algorithms without losing accuracy, making it suitable for analysis of a large number of sequences. Based on our algorithm, we developed an online web server and offline tool for haplotype network construction, community lineage determination, and interactive network visualization. We demonstrate that McAN is highly suitable for analyzing and visualizing massive genomic data and is helpful to enhance the understanding of genome evolution. Availability: Source code is written in C/C++ and available at https://github.com/Theory-Lun/McAN and https://ngdc.cncb.ac.cn/biocode/tools/BT007301 under the MIT license. The web server is available at https://ngdc.cncb.ac.cn/bit/hapnet/. SARS-CoV-2 datasets are available at https://ngdc.cncb.ac.cn/ncov/. Contact: songshh@big.ac.cn (Song S), zhaowm@big.ac.cn (Zhao W), baoym@big.ac.cn (Bao Y), zhangzhang@big.ac.cn (Zhang Z), ybxue@big.ac.cn (Xue Y).
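
Illustrative note (not part of the published abstract): the three criteria named in the abstract above (containment of ancestral mutations, node size, sampling time) can be sketched as a simple parent-selection rule. This is a minimal sketch under stated assumptions, not McAN's actual implementation: the `Haplotype` class, the `pick_parent` and `build_network` functions, the tie-breaking order, and the example mutation names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Haplotype:
    # Hypothetical record type; field names are assumptions for illustration.
    name: str
    mutations: FrozenSet[str]  # mutations relative to a reference genome
    size: int                  # number of samples sharing this haplotype
    first_seen: int            # earliest sampling time (e.g. a day index)


def pick_parent(child: Haplotype, candidates: List[Haplotype]) -> Optional[Haplotype]:
    """Return a plausible ancestor of `child`, or None if there is none."""
    ancestors = [
        h for h in candidates
        if h.mutations < child.mutations       # proper subset: ancestry rule
        and h.first_seen <= child.first_seen   # not sampled later than the child
    ]
    if not ancestors:
        return None
    # Assumed tie-breaking: fewest missing mutations, then larger node,
    # then earlier sampling time.
    return min(
        ancestors,
        key=lambda h: (len(child.mutations - h.mutations), -h.size, h.first_seen),
    )


def build_network(haplotypes: List[Haplotype]) -> Dict[str, Optional[str]]:
    """Map each haplotype name to its chosen parent's name (None for roots)."""
    edges: Dict[str, Optional[str]] = {}
    for h in haplotypes:
        parent = pick_parent(h, haplotypes)
        edges[h.name] = parent.name if parent else None
    return edges


# Toy input; the mutation labels are placeholders, not real data.
example = [
    Haplotype("ref", frozenset(), 50, 0),
    Haplotype("A", frozenset({"m1"}), 30, 3),
    Haplotype("B", frozenset({"m1", "m2"}), 12, 7),
    Haplotype("C", frozenset({"m2"}), 2, 9),
]
network = build_network(example)
```

In this toy input, haplotype `C` also carries a subset of `B`'s mutations but was sampled later, so the sketch does not accept it as `B`'s parent. The quadratic scan over all candidates is chosen for readability only and says nothing about how McAN achieves its reported speed.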

Subject(s)

COVID-19 , SARS-CoV-2 , Humans , Haplotypes , SARS-CoV-2/genetics , COVID-19/genetics , Algorithms , Genomics , Software

14.

MACdb: A Curated Knowledgebase for Metabolic Associations across Human Cancers.

Sun, Yanling; Zheng, Xinchang; Wang, Guoliang; Wang, Yibo; Chen, Xiaoning; Sun, Jiani; Xiong, Zhuang; Zhang, Sisi; Wang, Tianyi; Fan, Zhuojing; Bu, Congfan; Bao, Yiming; Zhao, Wenming.

Mol Cancer Res ; 21(7): 691-697, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37027007

ABSTRACT

Cancer is one of the leading causes of human death. As metabolomics techniques become more and more widely used in cancer research, metabolites are increasingly recognized as crucial factors in both cancer diagnosis and treatment. In this study, we developed MACdb (https://ngdc.cncb.ac.cn/macdb), a curated knowledgebase that collects the metabolic associations between metabolites and cancers. Unlike conventional data-driven resources, MACdb integrates cancer-metabolic knowledge from extensive publications, providing high-quality metabolite associations and tools to support multiple research purposes. In the current implementation, MACdb has integrated 40,710 cancer-metabolite associations, covering 267 traits from 17 categories of cancers with high incidence or mortality, based entirely on manual curation from 1,127 studies reported in 462 publications (screened from 5,153 research papers). MACdb offers intuitive browsing functions to explore associations across multiple dimensions (metabolite, trait, study, and publication), and constructs a knowledge graph to provide an overall landscape among cancer, trait, and metabolite. Furthermore, NameToCid (which maps metabolite names to PubChem CIDs) and Enrichment tools are developed to help users enrich the association of metabolites with various cancer types and traits. IMPLICATION: MACdb paves an informative and practical way to evaluate cancer-metabolite associations and has great potential to help researchers identify key predictive metabolic markers in cancers.

Subject(s)

Neoplasms , Humans , Neoplasms/genetics , Metabolomics/methods , Knowledge Bases

15.

Four principles to establish a universal virus taxonomy.

Simmonds, Peter; Adriaenssens, Evelien M; Zerbini, F Murilo; Abrescia, Nicola G A; Aiewsakun, Pakorn; Alfenas-Zerbini, Poliane; Bao, Yiming; Barylski, Jakub; Drosten, Christian; Duffy, Siobain; Duprex, W Paul; Dutilh, Bas E; Elena, Santiago F; García, Maria Laura; Junglen, Sandra; Katzourakis, Aris; Koonin, Eugene V; Krupovic, Mart; Kuhn, Jens H; Lambert, Amy J; Lefkowitz, Elliot J; Lobocka, Malgorzata; Lood, Cédric; Mahony, Jennifer; Meier-Kolthoff, Jan P; Mushegian, Arcady R; Oksanen, Hanna M; Poranen, Minna M; Reyes-Muñoz, Alejandro; Robertson, David L; Roux, Simon; Rubino, Luisa; Sabanadzovic, Sead; Siddell, Stuart; Skern, Tim; Smith, Donald B; Sullivan, Matthew B; Suzuki, Nobuhiro; Turner, Dann; Van Doorslaer, Koenraad; Vandamme, Anne-Mieke; Varsani, Arvind; Vasilakis, Nikos.

PLoS Biol ; 21(2): e3001922, 2023 02.

Article in English | MEDLINE | ID: mdl-36780432

ABSTRACT

A universal taxonomy of viruses is essential for a comprehensive view of the virus world and for communicating the complicated evolutionary relationships among viruses. However, there are major differences in the conceptualisation and approaches to virus classification and nomenclature among virologists, clinicians, agronomists, and other interested parties. Here, we provide recommendations to guide the construction of a coherent and comprehensive virus taxonomy, based on expert scientific consensus. Firstly, assignments of viruses should be congruent with the best attainable reconstruction of their evolutionary histories, i.e., taxa should be monophyletic. This fundamental principle for classification of viruses is currently included in the International Committee on Taxonomy of Viruses (ICTV) code only for the rank of species. Secondly, phenotypic and ecological properties of viruses may inform, but not override, evolutionary relatedness in the placement of ranks. Thirdly, alternative classifications that consider phenotypic attributes, such as being vector-borne (e.g., "arboviruses"), infecting a certain type of host (e.g., "mycoviruses," "bacteriophages") or displaying specific pathogenicity (e.g., "human immunodeficiency viruses"), may serve important clinical and regulatory purposes but often create polyphyletic categories that do not reflect evolutionary relationships. Nevertheless, such classifications ought to be maintained if they serve the needs of specific communities or play a practical clinical or regulatory role. However, they should not be considered or called taxonomies. Finally, while an evolution-based framework enables viruses discovered by metagenomics to be incorporated into the ICTV taxonomy, there are essential requirements for quality control of the sequence data used for these assignments. Combined, these four principles will enable future development and expansion of virus taxonomy as the true evolutionary diversity of viruses becomes apparent.

Subject(s)

Bacteriophages , Viruses , Humans , Metagenomics , Phylogeny , Viruses/genetics

16.

ASCancer Atlas: a comprehensive knowledgebase of alternative splicing in human cancers.

Wu, Song; Huang, Yue; Zhang, Mochen; Gong, Zheng; Wang, Guoliang; Zheng, Xinchang; Zong, Wenting; Zhao, Wei; Xing, Peiqi; Li, Rujiao; Liu, Zhaoqi; Bao, Yiming.

Nucleic Acids Res ; 51(D1): D1196-D1204, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36318242

ABSTRACT

Alternative splicing (AS) is a fundamental process that governs almost all aspects of cellular functions, and dysregulation in this process has been implicated in tumor initiation, progression and treatment resistance. With accumulating studies of carcinogenic mis-splicing in cancers, there is an urgent demand to integrate cancer-associated splicing changes to better understand their internal cross-talks and functional consequences from a global view. However, a resource of key functional AS events in human cancers is still lacking. To fill the gap, we developed ASCancer Atlas (https://ngdc.cncb.ac.cn/ascancer), a comprehensive knowledgebase of aberrant splicing in human cancers. Compared to extant databases, ASCancer Atlas features a high-confidence collection of 2006 cancer-associated splicing events experimentally proved to promote tumorigenesis, a systematic splicing regulatory network, and a suite of multi-scale online analysis tools. For each event, we manually curated the functional axis including upstream splicing regulators, splicing event annotations, downstream oncogenic effects, and possible therapeutic strategies. ASCancer Atlas also houses about 2 million computationally putative splicing events. Additionally, a user-friendly web interface was built to enable users to easily browse, search, visualize, analyze, and download all splicing events. Overall, ASCancer Atlas provides a unique resource to study the functional roles of splicing dysregulation in human cancers.

Subject(s)

Alternative Splicing , Databases, Genetic , Neoplasms , Humans , Alternative Splicing/genetics , Databases, Factual , Neoplasms/genetics , RNA Splicing , Atlases as Topic

17.

MethBank 4.0: an updated database of DNA methylation across a variety of species.

Zhang, Mochen; Zong, Wenting; Zou, Dong; Wang, Guoliang; Zhao, Wei; Yang, Fei; Wu, Song; Zhang, Xinran; Guo, Xutong; Ma, Yingke; Xiong, Zhuang; Zhang, Zhang; Bao, Yiming; Li, Rujiao.

Nucleic Acids Res ; 51(D1): D208-D216, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36318250

ABSTRACT

DNA methylation, as the most intensively studied epigenetic mark, regulates gene expression in numerous biological processes including development, aging, and disease. With the rapid accumulation of whole-genome bisulfite sequencing data, integrating, archiving, analyzing, and visualizing those data becomes critical. Since its first publication in 2015, MethBank has been continuously updated to include more DNA methylomes across more diverse species. Here, we present MethBank 4.0 (https://ngdc.cncb.ac.cn/methbank/), which reports an increase of 309% in data volume, with 1449 single-base resolution methylomes of 23 species, covering 236 tissues/cell lines and 15 biological contexts. Value-added information, such as more rigorous quality evaluation, more standardized metadata, and comprehensive downstream annotations have been integrated in the new version. Moreover, expert-curated knowledge modules of featured differentially methylated genes associated with biological contexts and methylation analysis tools have been incorporated as new components of MethBank. In addition, MethBank 4.0 is equipped with a series of new web interfaces to browse, search, and visualize DNA methylation profiles and related information. With all these improvements, we believe the updated MethBank 4.0 will serve as a fundamental resource to provide a wide range of data services for the global research community.

Subject(s)

DNA Methylation , Databases, Genetic , Epigenomics , Databases, Factual , Epigenome , Sequence Analysis, DNA , Whole Genome Sequencing

18.

HGD: an integrated homologous gene database across multiple species.

Duan, Guangya; Wu, Gangao; Chen, Xiaoning; Tian, Dongmei; Li, Zhaohua; Sun, Yanling; Du, Zhenglin; Hao, Lili; Song, Shuhui; Gao, Yuan; Xiao, Jingfa; Zhang, Zhang; Bao, Yiming; Tang, Bixia; Zhao, Wenming.

Nucleic Acids Res ; 51(D1): D994-D1002, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36318261

ABSTRACT

Homology is fundamental to inferring genes' evolutionary processes and relationships with shared ancestry. Existing homologous gene resources vary in terms of inference methods, homologous relationships and identifiers, posing inevitable difficulties for choosing and mapping homology results from one to another. Here, we present HGD (Homologous Gene Database, https://ngdc.cncb.ac.cn/hgd), a comprehensive homologs resource integrating multi-species, multi-resources and multi-omics, as a complement to existing resources providing public and one-stop data service. Currently, HGD houses a total of 112 383 644 homologous pairs for 37 species, including 19 animals, 16 plants and 2 microorganisms. Meanwhile, HGD integrates various annotations from public resources, including 16 909 homologs with traits, 276 670 homologs with variants, 398 573 homologs with expression and 536 852 homologs with gene ontology (GO) annotations. HGD provides a wide range of omics gene function annotations to help users gain a deeper understanding of gene function.

Subject(s)

Databases, Genetic , Animals , Molecular Sequence Annotation

19.

Brain Catalog: a comprehensive resource for the genetic landscape of brain-related traits.

Pan, Siyu; Kang, Hongen; Liu, Xinxuan; Lin, Shiqi; Yuan, Na; Zhang, Zhang; Bao, Yiming; Jia, Peilin.

Nucleic Acids Res ; 51(D1): D835-D844, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36243988

ABSTRACT

A broad range of complex phenotypes are related to dysfunctions in brain (hereafter referred to as brain-related traits), including various mental and behavioral disorders and diseases of the nervous system. These traits in general share overlapping symptoms, pathogenesis, and genetic components. Here, we present Brain Catalog (https://ngdc.cncb.ac.cn/braincatalog), a comprehensive database aiming to delineate the genetic components of more than 500 GWAS summary statistics datasets for brain-related traits from multiple aspects. First, Brain Catalog provides results of candidate causal variants, causal genes, and functional tissues and cell types for each trait identified by multiple methods using comprehensive annotation datasets (58 QTL datasets spanning 6 types of QTLs). Second, Brain Catalog estimates the SNP-based heritability, the partitioning heritability based on functional annotations, and genetic correlations among traits. Finally, through bidirectional Mendelian randomization analyses, Brain Catalog presents inference of risk factors that are likely causal to each trait. In conclusion, Brain Catalog presents a one-stop shop for the genetic components of brain-related traits, potentially serving as a valuable resource for worldwide researchers to advance the understanding of how GWAS signals may contribute to the biological etiology of brain-related traits.

Subject(s)

Brain , Databases, Genetic , Mental Disorders , Brain/physiopathology , Phenotype , Quantitative Trait Loci , Mental Disorders/genetics

20.

Cell Taxonomy: a curated repository of cell types with multifaceted characterization.

Jiang, Shuai; Qian, Qiheng; Zhu, Tongtong; Zong, Wenting; Shang, Yunfei; Jin, Tong; Zhang, Yuansheng; Chen, Ming; Wu, Zishan; Chu, Yuan; Zhang, Rongqin; Luo, Sicheng; Jing, Wei; Zou, Dong; Bao, Yiming; Xiao, Jingfa; Zhang, Zhang.

Nucleic Acids Res ; 51(D1): D853-D860, 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36161321

ABSTRACT

Single-cell studies have delineated cellular diversity and uncovered increasing numbers of previously uncharacterized cell types in complex tissues. Thus, synthesizing growing knowledge of cellular characteristics is critical for dissecting cellular heterogeneity, developmental processes and tumorigenesis at single-cell resolution. Here, we present Cell Taxonomy (https://ngdc.cncb.ac.cn/celltaxonomy), a comprehensive and curated repository of cell types and associated cell markers encompassing a wide range of species, tissues and conditions. Combined with literature curation and data integration, the current version of Cell Taxonomy establishes a well-structured taxonomy for 3,143 cell types and houses a comprehensive collection of 26,613 associated cell markers in 257 conditions and 387 tissues across 34 species. Based on 4,299 publications and single-cell transcriptomic profiles of ~3.5 million cells, Cell Taxonomy features multifaceted characterization for cell types and cell markers, involving quality assessment of cell markers and cell clusters, cross-species comparison, cell composition of tissues and cellular similarity based on markers. Taken together, Cell Taxonomy represents a fundamentally useful reference to systematically and accurately characterize cell types and thus lays an important foundation for deeply understanding and exploring cellular biology in diverse species.
